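In this paper, we propose a probabilistic continuous-time visual-inertial odometry (VIO) for rolling shutter cameras. The continuous-time trajectory formulation naturally facilitates the fusion of asynchronous high-frequency IMU data and motion-distorted rolling shutter images. To keep the computational load tractable, the proposed VIO is sliding-window and keyframe-based. We propose to probabilistically marginalize the control points so that the number of keyframes in the sliding window stays constant. Furthermore, the line exposure time difference (line delay) of the rolling shutter camera can be calibrated online in our continuous-time VIO. To examine the performance of our continuous-time VIO extensively, we conduct experiments on the publicly available WHU-RSVI, TUM-RSVI, and Sensetime-RSVI rolling shutter datasets. The results show that the proposed continuous-time VIO significantly outperforms existing state-of-the-art VIO methods. The codebase of this paper will also be open-sourced at \url{https://github.com/april-zju/ctrl-vio}.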
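To make the role of the line delay concrete, the sketch below assigns a capture time to each image row. It assumes rows are read out sequentially at a constant line delay, so a feature's timestamp depends on its row; the function names and numeric values are illustrative assumptions, not taken from the paper's codebase.

```python
def row_timestamp(frame_start, row, line_delay):
    """Capture time of one image row, assuming a constant line delay (seconds)."""
    return frame_start + row * line_delay


def feature_timestamps(frame_start, rows, line_delay):
    """Capture time for each feature, given the image row it was observed on."""
    return [row_timestamp(frame_start, r, line_delay) for r in rows]


# Hypothetical values: frame starts at t = 10.0 s, line delay of 30 microseconds.
times = feature_timestamps(10.0, [0, 240, 479], 30e-6)
```

In a continuous-time formulation, each of these per-row timestamps can be used to query the trajectory at the moment the row was actually exposed, which is why the line delay can be treated as a parameter to estimate.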
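With the success of self-supervised learning (SSL), fine-tuning from self-supervised pretrained models has become a mainstream paradigm for boosting performance on downstream tasks. However, we find that current SSL models suffer severe accuracy drops under low-bit quantization, which prevents their deployment in resource-constrained applications. In this paper, we propose a method called synergistic self-supervised and quantization learning (SSQL) to pretrain quantization-friendly self-supervised models that facilitate downstream deployment. SSQL contrasts the features of the quantized and full-precision models in a self-supervised fashion, with the bit-width of the quantized model randomly selected at each step. SSQL not only significantly improves accuracy when quantizing to lower bit-widths, but also boosts the accuracy of full-precision models in most cases. By training only once, SSQL can benefit various downstream tasks at different bit-widths simultaneously. Moreover, this bit-width flexibility comes without additional storage overhead, since only one copy of the weights is needed during training and inference. We theoretically analyze the optimization process of SSQL and conduct exhaustive experiments on various benchmarks to further demonstrate the effectiveness of our method. Our code is available at https://github.com/megvii-research/ssql-eccv2022.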
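The idea of contrasting quantized and full-precision features at a randomly chosen bit-width can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the min-max uniform quantizer, the cosine-based loss, and the shortcut of quantizing a feature vector directly rather than the weights and activations of a network. It is not the SSQL implementation.

```python
import math
import random


def quantize_uniform(values, bits):
    """Snap each value to one of 2**bits evenly spaced levels over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    scale = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / scale) * scale for v in values]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def random_bitwidth_step(features, candidate_bits, rng):
    """Pick a random bit-width, quantize, and score agreement with full precision."""
    bits = rng.choice(candidate_bits)
    quantized = quantize_uniform(features, bits)
    loss = 1.0 - cosine_similarity(features, quantized)
    return bits, loss


rng = random.Random(0)
full_precision = [-1.0, 0.2, 0.7, 1.5]  # hypothetical feature vector
bits, loss = random_bitwidth_step(full_precision, [2, 4, 8], rng)
```

Sampling the bit-width anew at every step is what lets a single set of weights be exercised under many quantization levels during one training run.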
Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are often incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We propose PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, which provides a simple and unified way to identify self-organization and reveal hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information, as quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL combines the evolution of a quantum continuous walk with graph wavelets to encode node structural roles, showing how the nodes interact and self-organize within the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
Federated learning achieves joint training of deep models by connecting decentralized data sources, which can significantly mitigate the risk of privacy leakage. However, in a more general case, label distributions differ across clients, a setting known as ``label distribution skew''. Directly applying conventional federated learning without accounting for label distribution skew significantly hurts the performance of the global model. To this end, we propose a novel federated learning method, named FedMGD, to alleviate the performance degradation caused by label distribution skew. It introduces a global Generative Adversarial Network to model the global data distribution without access to local datasets, so the global model can be trained using global information about the data distribution without privacy leakage. The experimental results demonstrate that our proposed method significantly outperforms the state of the art on several public benchmarks. Code is available at \url{https://github.com/Sheng-T/FedMGD}.
Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Although they offer a promising route toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference. It also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves 95% of the performance on average with only 16% of the parameters of 15 task-finetuned models, showcasing the performance reliability of multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys
Generative modeling of human motion has broad applications in computer animation, virtual reality, and robotics. Conventional approaches develop separate models for different motion synthesis tasks, and typically use a small model to avoid overfitting the scarce data available in each setting. It remains an open question whether developing a single unified model is feasible, which may 1) benefit the acquisition of novel skills by combining skills learned from multiple tasks, and 2) help increase the model capacity without overfitting by combining multiple data sources. Unification is challenging because 1) it involves diverse control signals as well as targets of varying granularity, and 2) motion datasets may use different skeletons and default poses. In this paper, we present MoFusion, a framework for unified motion synthesis. MoFusion employs a Transformer backbone to ease the inclusion of diverse control signals via cross attention, and pretrains the backbone as a diffusion model to support multi-granularity synthesis ranging from motion completion of a body part to whole-body motion generation. It uses a learnable adapter to accommodate the differences between the default skeletons used by the pretraining and the fine-tuning data. Empirical results show that pretraining is vital for scaling the model size without overfitting, and demonstrate MoFusion's potential in various tasks, e.g., text-to-motion, motion completion, and zero-shot mixing of multiple control signals. Project page: \url{https://ofa-sys.github.io/MoFusion/}.
Text-driven person image generation is an emerging and challenging task in cross-modality image generation. Controllable person image generation supports a wide range of applications such as digital human interaction and virtual try-on. However, previous methods mostly employ single-modality information as the prior condition (e.g., pose-guided person image generation) or rely on preset words for text-driven human synthesis. Describing a person's appearance with a free-form sentence together with an editable semantic pose map is more user-friendly. In this paper, we propose HumanDiffusion, a coarse-to-fine alignment diffusion framework for text-driven person image generation. Specifically, two collaborative modules are proposed: the Stylized Memory Retrieval (SMR) module for fine-grained feature distillation in data processing, and the Multi-scale Cross-modality Alignment (MCA) module for coarse-to-fine feature alignment in diffusion. Together, these two modules ensure the alignment quality of text and image, from image level to feature level and from low resolution to high resolution. As a result, HumanDiffusion realizes open-vocabulary person image generation with desired semantic poses. Extensive experiments conducted on DeepFashion demonstrate the superiority of our method over previous approaches. Moreover, it obtains better results for complicated person images with varied details and uncommon poses.
In this work, we focus on the problem of safe policy transfer in reinforcement learning: we seek to leverage existing policies when learning a new task with specified constraints. This problem is important for safety-critical applications where interactions are costly and unconstrained policies can lead to undesirable or dangerous outcomes, e.g., with physical robots that interact with humans. We propose a Constrained Markov Decision Process (CMDP) formulation that simultaneously enables the transfer of policies and adherence to safety constraints. Our formulation cleanly separates task goals from safety considerations and permits the specification of a wide variety of constraints. Our approach relies on a novel extension of generalized policy improvement to constrained settings via a Lagrangian formulation. We devise a dual optimization algorithm that estimates the optimal dual variable of a target task, thus enabling safe transfer of policies derived from successor features learned on source tasks. Our experiments in simulated domains show that our approach is effective; it visits unsafe states less frequently and outperforms alternative state-of-the-art methods when taking safety constraints into account.
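The interplay between a Lagrangian objective and a dual variable can be sketched with a toy example. This is generic projected dual ascent over a fixed set of candidate policies, each summarized by a made-up (reward, cost) pair; the names, numbers, and update rule are illustrative assumptions and not the paper's algorithm, which estimates the optimal dual variable using successor features.

```python
def select_policy(candidates, lam):
    """Pick the candidate that maximizes the Lagrangian reward - lam * cost."""
    return max(candidates, key=lambda c: c["reward"] - lam * c["cost"])


def dual_update(lam, cost, limit, lr):
    """Projected dual ascent: raise lam when the constraint is violated, never below 0."""
    return max(0.0, lam + lr * (cost - limit))


# Hypothetical candidates: expected return and expected safety cost per policy.
candidates = [
    {"name": "fast", "reward": 10.0, "cost": 4.0},
    {"name": "careful", "reward": 7.0, "cost": 1.0},
]

lam = 0.0
for _ in range(50):
    chosen = select_policy(candidates, lam)
    lam = dual_update(lam, chosen["cost"], limit=2.0, lr=0.1)
```

With a cost limit of 2.0, the dual variable rises while the high-cost candidate is selected and falls once the low-cost one takes over, so it ends up hovering around the value at which the two candidates tie.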
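This paper proposes SCALES, a general framework that translates fairness principles into a common representation based on the Constrained Markov Decision Process (CMDP). With the help of causal language, our framework can place constraints on both the decision-making procedure (procedural fairness) and the outcomes that result from decisions (outcome fairness). Specifically, we show that well-known fairness principles can be encoded as a utility component, a non-causal component, or a causal component in a SCALES-CMDP. We illustrate SCALES using a set of case studies involving a simulated healthcare scenario and the real-world COMPAS dataset. Experiments show that our framework produces fair policies that embody alternative fairness principles in both single-step and sequential decision-making scenarios.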
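Domain generalization (DG) aims to learn a model on several source domains in the hope that it will generalize well to unseen target domains. The distribution shift between domains comprises covariate shift and conditional shift, and a model must handle both to generalize better. In this paper, we propose a novel DG method that handles distribution shift via Visual Alignment and Uncertainty-guided belief Ensemble (VAUE). Specifically, for the covariate shift, a visual alignment module is designed to align the distribution of image style with a common empirical Gaussian distribution, so that the covariate shift can be eliminated in the visual space. For the conditional shift, we adopt an uncertainty-guided belief ensemble strategy based on subjective logic and Dempster-Shafer theory. The conditional distribution given a test sample is estimated by a dynamic combination of those of the source domains. Comprehensive experiments demonstrate the superior performance of the proposed method on four widely used datasets: Office-Home, VLCS, TerraIncognita, and PACS.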
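As background for the belief ensemble, the sketch below implements the standard Dempster rule of combination for two sources. Each source assigns belief mass to sets of classes, and mass placed on the full class set plays the role of that source's uncertainty. The class names and masses are made-up values, and the abstract does not specify the exact ensemble rule, so this shows only the textbook rule.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of classes -> mass)."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses is discarded.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}


# Hypothetical two-class example; THETA is the full class set (uncertainty).
DOG, CAT = frozenset({"dog"}), frozenset({"cat"})
THETA = DOG | CAT
source_a = {DOG: 0.6, THETA: 0.4}
source_b = {CAT: 0.5, THETA: 0.5}
fused = dempster_combine(source_a, source_b)
```

Because the conflicting mass is renormalized away, a source that places most of its mass on the full class set has little influence on the fused belief, which is the sense in which uncertainty can guide the ensemble.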